---
title: Aggregations/Facets
---

<Info>
  Facets/aggregations are a ParadeDB enterprise feature. [Contact
  us](mailto:sales@paradedb.com) for access.
</Info>

<Note>
  **Prerequisite** Aggregations currently work only on [fast
  fields](/search/full-text/index#fast-fields). Ensure that the fields you want
  to aggregate on are configured as fast fields.
</Note>

## Overview

ParadeDB provides powerful aggregation capabilities over text search results using Tantivy's aggregation features. With the `aggregate` function, you can compute various statistical metrics and group data into buckets based on search queries. This is especially useful for deriving insights from your data.

You can specify the aggregation operation and the field on which the operation is performed in a JSON format compatible with Elasticsearch.

### Creating the Test Table and BM25 Index

Before performing any aggregations, you need to create the test table and BM25 index as shown below:

```sql
CALL paradedb.create_bm25_test_table(
    table_name => 'bm25_search',
    schema_name => 'paradedb'
);

CALL paradedb.create_bm25(
    index_name => 'bm25_search',
    table_name => 'bm25_search',
    schema_name => 'paradedb',
    key_field => 'id',
    text_fields => paradedb.field('description'),
    numeric_fields => paradedb.field('rating'),
    boolean_fields => paradedb.field('in_stock'),
    json_fields => paradedb.field('metadata'),
    datetime_fields => paradedb.field('created_at') ||
                       paradedb.field('last_updated_date') ||
                       paradedb.field('latest_available_time')
);
```

### Syntax

```sql
SELECT bm25_search.aggregate('<aggregation_query>');
```

### Example Usage

Below are examples demonstrating the use of the `bm25_search.aggregate` function with different aggregation queries.

#### Average Rating

Calculate the average rating for all documents:

```sql
SELECT bm25_search.aggregate('{
  "avg_rating": {
    "avg": {"field": "rating"}
  }
}');
```

#### Average Rating with Query Argument

Calculate the average rating for documents that match a specific search query. In this example, only documents containing the keyword "keyboard" in their description are considered:

```sql
SELECT bm25_search.aggregate(
  '{
    "avg_rating": {
      "avg": {"field": "rating"}
    }
  }',
  query => paradedb.parse('description:keyboard')
);
```

<Accordion title="Example Aggregation Queries">

```sql
-- Average rating for all documents
SELECT bm25_search.aggregate('{
  "avg_rating": {
    "avg": {"field": "rating"}
  }
}');

-- Average rating for documents containing "keyboard"
SELECT bm25_search.aggregate(
  '{
    "avg_rating": {
      "avg": {"field": "rating"}
    }
  }',
  query => paradedb.parse('description:keyboard')
);
```

</Accordion>

### User-Defined Aggregation Names

In the examples above, the top-level key (`avg_rating`) in the aggregation query is a user-defined name. This name is arbitrary and can be any valid string. It serves as an identifier for the aggregation result, allowing you to reference and distinguish between multiple aggregations in the response.

This naming convention is similar to Elasticsearch's format, where user-defined names are used to identify and organize aggregation results. This flexibility is essential when you have complex aggregation queries involving multiple metrics or nested aggregations.

## Metric Aggregations

### Average

An average aggregation calculates the mean of the specified numeric field values across all documents.

<ParamField body="field">The field name to compute the average on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Sum

A sum aggregation computes the total sum of the specified numeric field values across all documents.

<ParamField body="field">The field name to compute the sum on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Count

A count aggregation tallies the number of values for the specified numeric field across all documents.

<ParamField body="field">The field name to compute the count on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Min

A min aggregation finds the smallest value for the specified numeric field across all documents.

<ParamField body="field">The field name to compute the minimum on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Max

A max aggregation finds the largest value for the specified numeric field across all documents.

<ParamField body="field">The field name to compute the maximum on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Stats

A stats aggregation provides a collection of statistical metrics for the specified numeric field, including count, sum, average, min, and max.

<ParamField body="field">The field name to compute the stats on.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

### Percentiles

The percentiles aggregation calculates the values below which given percentages of the data fall, providing insights into the distribution of a dataset.

<ParamField body="field">
  The field name to compute the percentiles on.
</ParamField>
<ParamField body="percents" default={[1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0]}>
  The percentiles to compute.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the percentiles as a hash map.
</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field. By default, missing values
  are ignored.
</ParamField>

## Bucket Aggregations

### Histogram

Histogram is a bucket aggregation where buckets are created dynamically based on a specified interval. Each document value is rounded down to its bucket. For example, if you have a price of 18 and an interval of 5, the document will fall into the bucket with the key 15. The formula used for this is: `((val - offset) / interval).floor() * interval + offset`.

<ParamField body="field">The field to aggregate on.</ParamField>
<ParamField body="interval">
  The interval to chunk your data range. Each bucket spans a value range of
  [0..interval). Must be a positive value.
</ParamField>
<ParamField body="offset" default={0.0}>
  Shift the grid of buckets by the specified offset.
</ParamField>
<ParamField body="min_doc_count" default={0}>
  The minimum number of documents in a bucket to be returned.
</ParamField>
<ParamField body="hard_bounds">
  Limits the data range to [min, max] closed interval.
</ParamField>
<ParamField body="extended_bounds">
  Extends the value range of the buckets.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>
<ParamField body="is_normalized_to_ns" default={false}>
  Whether the values are normalized to ns for date time values.
</ParamField>

### Date Histogram

DateHistogramAggregation is similar to HistogramAggregation, but it can only be used with date type. Currently, only fixed time intervals are supported.

<ParamField body="field">The field to aggregate on.</ParamField>
<ParamField body="fixed_interval">
  The interval to chunk your data range. Each bucket spans a value range of
  [0..fixed_interval). Accepted values: ms, s, m, h, d.
</ParamField>
<ParamField body="offset" default={0}>
  Shift the grid of buckets by the specified offset.
</ParamField>
<ParamField body="min_doc_count" default={0}>
  The minimum number of documents in a bucket to be returned.
</ParamField>
<ParamField body="hard_bounds">
  Limits the data range to [min, max] closed interval.
</ParamField>
<ParamField body="extended_bounds">
  Extends the value range of the buckets.
</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>

### Range

Range aggregation allows you to define custom buckets for specific ranges.

<ParamField body="field">The field to aggregate on.</ParamField>
<ParamField body="ranges">A list of ranges to aggregate on.</ParamField>
<ParamField body="keyed" default={false}>
  Whether to return the buckets as a hash map.
</ParamField>

### Terms

Terms aggregation creates a bucket for every unique term and counts the number of occurrences.

<ParamField body="field">The field to aggregate on.</ParamField>
<ParamField body="size" default={10}>
  The number of terms to return.
</ParamField>
<ParamField body="segment_size" default={100}>
  The number of terms to fetch from each segment.
</ParamField>
<ParamField body="show_term_doc_count_error" default={false}>
  Whether to include the document count error.
</ParamField>
<ParamField body="min_doc_count" default={1}>
  The minimum number of documents in a term to be returned.
</ParamField>
<ParamField body="order">The order in which to return the terms.</ParamField>
<ParamField body="missing">
  The value to use for documents missing the field.
</ParamField>

## Nested Aggregations

Buckets can contain sub-aggregations. For example, creating buckets with the range aggregation and then calculating the average on each bucket:

```sql
SELECT bm25_search.aggregate('{
  "range_rating": {
    "range": {
      "field": "rating",
      "ranges": [
        { "from": 1, "to": 3 },
        { "from": 3, "to": 5 }
      ]
    },
    "aggs": {
      "average_in_range": { "avg": { "field": "rating"} }
    }
  }
}');
```
